It can be seen that the F statistic should be maximised. This is because maximising the between-cluster variance $S_B^2$ is equivalent to the maximisation of the F statistic, and minimising the within-cluster variance $S_W^2$ is equivalent to the maximisation of the F statistic as well.
Using different cluster numbers yields different F statistic values. To find the best cluster model structure, it is therefore necessary to search for the structure that maximises the F statistic. This means that the best cluster model will correspond to the maximised F statistic for a specific cluster number.
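As a minimal sketch, the F statistic of a single partition can be computed as the ratio of the between-cluster variance $S_B^2$ to the within-cluster variance $S_W^2$. The degrees-of-freedom scaling (K - 1 and N - K) follows the usual one-way ANOVA convention and is an assumption here; the exact normalisation used in this book may differ, and the function name `f_statistic` is hypothetical.

```python
import numpy as np


def f_statistic(X, labels):
    """ANOVA-style F statistic of a partition (assumed definition).

    Returns (F, df_between, df_within), where
    F = S_B^2 / S_W^2, S_B^2 = SS_between / (K - 1)
    and S_W^2 = SS_within / (N - K).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as N samples with one feature
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    if not 2 <= k < n:
        raise ValueError("need 2 <= number of clusters < number of samples")

    grand_mean = X.mean(axis=0)
    ss_between = 0.0
    ss_within = 0.0
    for c in clusters:
        members = X[labels == c]
        centre = members.mean(axis=0)
        # Between-cluster scatter: cluster size times squared offset of its centre.
        ss_between += members.shape[0] * np.sum((centre - grand_mean) ** 2)
        # Within-cluster scatter: squared distances of members to their centre.
        ss_within += np.sum((members - centre) ** 2)

    s_b2 = ss_between / (k - 1)  # between-cluster variance S_B^2
    s_w2 = ss_within / (n - k)   # within-cluster variance S_W^2
    return float(s_b2 / s_w2), k - 1, n - k
```

For example, `f_statistic([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])` returns `(200.0, 1, 2)`: the between-cluster sum of squares is 100 and the within-cluster sum of squares is 1, so $S_B^2 = 100$ and $S_W^2 = 0.5$. Tightening the clusters shrinks $S_W^2$ and raises F, which is the behaviour described above.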
Because it is not easy to determine a good threshold for the F statistic, its p value is used for the significance analysis. Figure 2.30(a) shows a toy data set with nine clusters. The distribution of p values is shown in Figure 2.30(b), where 15 models, employing two, three, up to 16 clusters, were constructed. If the critical p value was 0.01, the selected model structure would have eight clusters, because it was the first cluster structure with a p value less than 0.01 in the p value distribution. This was one cluster fewer than the true cluster number.
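The selection rule described above can be sketched as follows, assuming the p values have already been computed for each candidate cluster number (for example, from the upper tail of the F distribution with K - 1 and N - K degrees of freedom, as returned by `scipy.stats.f.sf(F, K - 1, N - K)`). The function name `select_cluster_number` and its dictionary input are hypothetical; the function scans cluster numbers in increasing order and returns the first whose p value falls below the critical value.

```python
def select_cluster_number(p_values, alpha=0.01):
    """Return the smallest cluster number whose p value is below alpha.

    p_values maps cluster number -> p value of the model's F statistic.
    Returns None if no cluster number reaches significance.
    """
    for k in sorted(p_values):  # scan from the fewest clusters upwards
        if p_values[k] < alpha:
            return k
    return None
```

With made-up p values `{2: 0.2, 3: 0.05, 4: 0.004, 5: 0.001}` (illustrative only, not read from Figure 2.30(b)), the function returns 4 at the default critical value of 0.01 and `None` at 0.001, since no value is strictly below it. Because the rule stops at the first significant structure, it can settle on too few clusters, as in the nine-cluster example above.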
Figure 2.30 The K-means simulation for a data set with nine clusters. (a) Visualisation of the models with seven, eight, nine and ten clusters. (b) The F statistic p values of the K-means models constructed for the data.